Bias-Aware Sketches

Authors

  • Jiecao Chen
  • Qin Zhang
Abstract

Count-Sketch [6] and Count-Median [11] are two widely used sketching algorithms for processing large-scale distributed and streaming datasets, with applications such as finding frequent elements, computing frequency moments, and performing point queries. The errors of Count-Sketch and Count-Median are expressed in terms of the sum of the coordinates of the input vector excluding the largest ones, that is, the mass on the tail of the vector. These algorithms therefore perform well only when the mass on the tail is small, which is not always the case: in many real-world datasets the coordinates of the input vector have a non-zero bias, which generates a large mass on the tail. In this paper we propose linear sketches that are bias-aware. They can be used as substitutes for Count-Sketch and Count-Median, and achieve strictly better error guarantees. We also demonstrate their practicality through an extensive experimental evaluation on both real and synthetic datasets.
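To make the effect of bias concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It implements a textbook-style Count-Sketch in plain Python and compares the mean point-query error on a vector centered near zero with the error on the same vector after a constant bias is added to every coordinate. The class name `CountSketch`, the helpers `mean_abs_error` and `demo`, the explicit random tables used in place of hash functions, and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics


class CountSketch:
    """Minimal Count-Sketch: `depth` rows of `width` signed counters."""

    def __init__(self, n, width, depth, seed=0):
        rng = random.Random(seed)
        # Explicit random tables stand in for the hash functions a real
        # implementation would use (an assumption made for simplicity).
        self.bucket = [[rng.randrange(width) for _ in range(n)] for _ in range(depth)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(depth)]
        self.table = [[0.0] * width for _ in range(depth)]

    def update(self, i, delta):
        # Linear update: add the signed delta to one counter per row.
        for r, row in enumerate(self.table):
            row[self.bucket[r][i]] += self.sign[r][i] * delta

    def query(self, i):
        # Point query: median of the per-row signed counter values.
        return statistics.median(
            self.sign[r][i] * row[self.bucket[r][i]]
            for r, row in enumerate(self.table)
        )


def mean_abs_error(x, width=64, depth=5, seed=1):
    """Sketch the vector x and return the mean absolute point-query error."""
    sk = CountSketch(len(x), width, depth, seed)
    for i, v in enumerate(x):
        sk.update(i, v)
    return sum(abs(sk.query(i) - v) for i, v in enumerate(x)) / len(x)


def demo(n=2000, bias=100.0, seed=7):
    """Return (error without bias, error with bias) for a synthetic vector."""
    rng = random.Random(seed)
    # Small noise around zero plus a few heavy coordinates.
    centered = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for i in rng.sample(range(n), 10):
        centered[i] += 1000.0
    # The same vector with a constant bias added to every coordinate,
    # which puts a large mass on the tail.
    biased = [v + bias for v in centered]
    return mean_abs_error(centered), mean_abs_error(biased)


if __name__ == "__main__":
    err_centered, err_biased = demo()
    print("mean error without bias:", err_centered)
    print("mean error with bias:   ", err_biased)
```

Both runs use the same sketch dimensions and the same random tables, so the gap between the two printed errors comes only from the added bias. Because the sketch is linear, a constant shift of the input corresponds to a fixed shift of the counters, which is the kind of structure a bias-aware sketch can exploit.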


Similar resources

New cardinality estimation algorithms for HyperLogLog sketches

This paper presents new methods to estimate the cardinalities of multisets recorded by HyperLogLog sketches. A theoretically motivated extension to the original estimator is presented that eliminates the bias for small and large cardinalities. Based on the maximum likelihood principle a second unbiased method is derived together with a robust and efficient numerical algorithm to calculate the e...


Automated Clock Drawing Test through Machine Learning and Geometric Analysis

In this paper, we discuss the challenges of sketch recognition accuracy and automation of the Clock Drawing Test (CDT). Sketch recognition in the context of the CDT is a complex problem due to the lack of knowledge of the preference bias among the sketches drawn by neuro-atypical patients. However, machine learning provides a viable solution to detect measurable patterns among sketches drawn in...


Allocation-Site Aware Shape Analysis and Applications in Hard Real-Time Systems

Shape analysis aims at determining invariants of heap-allocated structures that arise during the execution of a program. Current shape analysis techniques are stateless, i.e. they only model the structures arising on the heap and completely ignore their memory locations and where they were allocated. This paper proposes an extended, allocation-site aware shape analysis and briefly sketches field...


The correspondence bias.

The correspondence bias is the tendency to draw inferences about a person's unique and enduring dispositions from behaviors that can be entirely explained by the situations in which they occur. Although this tendency is one of the most fundamental phenomena in social psychology, its causes and consequences remain poorly understood. This article sketches an intellectual history of the correspond...


Context-aware garment modeling from sketches

Modeling of realistic garments is essential for creating believable virtual environments. Sketch-based modeling of garments presents an appealing, easy to use alternative to the established modeling approaches which are time consuming and require significant tailoring expertise. Unfortunately, the results created using existing sketch-based methods lack realism. Driven by human perception of ga...



Journal:
  • PVLDB

Volume: 10  Issue:

Pages: -

Publication date: 2017